scalar value
On Embeddings for Numerical Features in Tabular Deep Learning
Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows constructing more powerful DL models and competing with gradient boosted decision trees (GBDT) on some GBDT-friendly benchmarks (that is, where GBDT outperforms conventional DL models). We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations. Then, we empirically demonstrate that these two approaches can lead to significant performance boosts compared to the embeddings based on conventional blocks such as linear layers and ReLU activations. Importantly, we also show that embedding numerical features is beneficial for many backbones, not only for Transformers. Specifically, after proper embeddings, simple MLP-like models can perform on par with the attention-based architectures. Overall, we highlight embeddings for numerical features as an important design aspect with good potential for further improvements in tabular DL.
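The two embedding families named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the bin edges, the clipping of each component to [0, 1], and the fixed frequencies are all assumptions made for illustration (in a trained model the frequencies and any follow-up linear layers would typically be learned).

```python
import math


def piecewise_linear_encode(x, edges):
    """Encode scalar x against sorted bin edges, one component per bin.

    Assumption: each component is the fraction of its bin lying below x,
    clipped to [0, 1]. Bins fully below x give 1.0, bins fully above give 0.0.
    """
    encoded = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        frac = (x - lo) / (hi - lo)
        encoded.append(min(1.0, max(0.0, frac)))
    return encoded


def periodic_encode(x, frequencies):
    """Encode scalar x as (sin, cos) pairs, one pair per frequency.

    Assumption: the frequencies are fixed here purely for illustration.
    """
    encoded = []
    for f in frequencies:
        angle = 2.0 * math.pi * f * x
        encoded.extend([math.sin(angle), math.cos(angle)])
    return encoded


# Hypothetical bin edges and frequencies, chosen only for the example.
edges = [0.0, 1.0, 2.0, 4.0]
ple = piecewise_linear_encode(2.5, edges)  # three bins -> three components
per = periodic_encode(0.25, [1.0, 2.0])    # two frequencies -> four components
```

In both cases a single scalar becomes a short vector, which a backbone such as an MLP or a Transformer can then consume in place of the raw value.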
LARGO: Low-Rank Regulated Gradient Projection for Robust Parameter Efficient Fine-Tuning
Zhang, Haotian, Liu, Liu, Yu, Baosheng, Qiu, Jiayan, Ren, Yanwei, Liu, Xianglong
The advent of parameter-efficient fine-tuning methods has significantly reduced the computational burden of adapting large-scale pretrained models to diverse downstream tasks. However, existing approaches often struggle to achieve robust performance under domain shifts while maintaining computational efficiency. To address this challenge, we propose the Low-rAnk Regulated Gradient Projection (LARGO) algorithm, which integrates dynamic constraints into low-rank adaptation methods. Specifically, LARGO incorporates parallel trainable gradient projections to dynamically regulate layer-wise updates, retaining the out-of-distribution robustness of the pretrained model while preserving inter-layer independence. Additionally, it ensures computational efficiency by mitigating the influence of gradient dependencies across layers during weight updates. We also incorporate an SVD-based initialization strategy that uses the singular value decomposition of the pretrained weights for structured initialization, minimizing deviation from pretrained knowledge. Through extensive experiments on diverse benchmarks, LARGO achieves state-of-the-art performance across in-domain and out-of-distribution scenarios, demonstrating improved robustness under domain shifts with significantly lower computational overhead compared to existing PEFT methods. The source code will be released soon.
On Embeddings for Numerical Features in Tabular Deep Learning
Gorishniy, Yury, Rubachev, Ivan, Babenko, Artem
Recently, Transformer-like deep architectures have shown strong performance on tabular data problems. Unlike traditional models, e.g., MLP, these architectures map scalar values of numerical features to high-dimensional embeddings before mixing them in the main backbone. In this work, we argue that embeddings for numerical features are an underexplored degree of freedom in tabular DL, which allows constructing more powerful DL models and competing with GBDT on some traditionally GBDT-friendly benchmarks. We start by describing two conceptually different approaches to building embedding modules: the first one is based on a piecewise linear encoding of scalar values, and the second one utilizes periodic activations. Then, we empirically demonstrate that these two approaches can lead to significant performance boosts compared to the embeddings based on conventional blocks such as linear layers and ReLU activations. Importantly, we also show that embedding numerical features is beneficial for many backbones, not only for Transformers. Specifically, after proper embeddings, simple MLP-like models can perform on par with the attention-based architectures. Overall, we highlight embeddings for numerical features as an important design aspect with good potential for further improvements in tabular DL.
NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining
Lin, Chenguo, Wen, Xumeng, Cao, Wei, Huang, Congrui, Bian, Jiang, Lin, Stephen, Wu, Zhirong
Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within the window. To embed scalar values that may possess arbitrary numerical scales into high-dimensional vectors, we propose a numerically multi-scaled embedding module enumerating all possible scales for the scalar values. The model undergoes pretraining using the proposed numerically multi-scaled embedding with a simple contrastive objective on a large-scale dataset containing over a million sequences. We study its transfer performance on a number of univariate and multivariate classification benchmarks. Our method exhibits remarkable improvement over previous representation learning approaches and establishes a new state of the art, even compared with domain-specific non-learning-based methods. Despite the phenomenal achievement of large-scale representation learning on various data modalities (Brown et al., 2020; Radford et al., 2021; Caron et al., 2021), research on time-series representation learning is mostly limited to small-scale datasets without attaining generalization capabilities (Eldele et al., 2021b; Yue et al., 2022; Zhang et al., 2022). Since time-series data may cover a diverse range of domains, such as medical, weather, traffic and more, large-scale training across domains brings special challenges and opportunities for transfer learning. We notice a unique characteristic of time-series data and its representation.
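The windowing step described in the abstract (non-overlapping windows, each summarized by a normalized shape plus a mean and a standard deviation) might look roughly like the sketch below. The function name `window_features`, the `eps` stabilizer, the use of the population standard deviation, and the choice to drop a trailing partial window are assumptions for illustration, not details taken from the paper.

```python
import math


def window_features(series, window_size, eps=1e-8):
    """Split a series into non-overlapping windows and summarize each one.

    Returns a list of (shape, mean, std) tuples, where shape is the window
    standardized by its own mean and standard deviation.
    Assumption: a trailing window shorter than window_size is dropped.
    """
    features = []
    for start in range(0, len(series) - window_size + 1, window_size):
        window = series[start:start + window_size]
        mean = sum(window) / window_size
        # Population standard deviation of the window.
        std = math.sqrt(sum((v - mean) ** 2 for v in window) / window_size)
        # eps avoids division by zero for constant windows.
        shape = [(v - mean) / (std + eps) for v in window]
        features.append((shape, mean, std))
    return features


# Two full windows of size 2; the last sample is dropped.
feats = window_features([1.0, 3.0, 10.0, 10.0, 7.0], window_size=2)
```

The mean and standard deviation of each window are the scalar values that a numerical embedding module would then map to vectors, while the normalized shape carries the scale-free pattern.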
Provably Correct Sensor-driven Path-following for Unicycles using Monotonic Score Functions
Clark, Benton, Hariprasad, Varun, Poonawala, Hasan A.
This paper develops a provably stable sensor-driven controller for path-following applications of robots with unicycle kinematics, one specific class of which is the wheeled mobile robot (WMR). The sensor measurement is converted to a scalar value (the score) through some mapping (the score function); the latter may be designed or learned. The score is then mapped to forward and angular velocities using a simple rule with three parameters. The key contribution is that the correctness of this controller only relies on the score function satisfying monotonicity conditions with respect to the underlying state -- local path coordinates -- instead of achieving specific values at all states. The monotonicity conditions may be checked online by moving the WMR, without state estimation, or offline using a generative model of measurements such as in a simulator. Our approach provides both the practicality of a purely measurement-based control and the correctness of state-based guarantees. We demonstrate the effectiveness of this path-following approach on both a simulated and a physical WMR that use a learned score function derived from a binary classifier trained on real depth images.
Deep Learning Explained : Perceptron – Towards AI
Originally published on Towards AI. Nowadays, frameworks such as Keras, TensorFlow, or PyTorch provide turnkey access to most deep learning solutions without necessarily having to understand them in depth. But this can become a problem as soon as your model does not work as expected, and you may need to tweak it yourself. So, if you are here to understand the concept of the Perceptron in deep learning, I think you are on the right track. If you want to be able to contribute one day to this ecosystem in any way, it is essential to understand the roots of these systems.